Abstract - Counter Insurgency operations require the
ability to develop accurate representations of the
physical environment and the human landscape in
various conditions (e.g., urban and non-urban, day and
night, and various weather conditions). We are
developing innovative sensor suites and processing
techniques suitable for such domains as part of a larger
effort to support human-centric hard/soft data fusion. In
this paper, we present a sensor suite, an information
processing architecture, examples of the resulting fused
information, and future experimental designs. These
combined resources present opportunities for creating
rich 3-D characterizations of the environment and can
support novel hybrid human/computer methods for
target characterization, identification, and tracking.

Keywords:  counter  insurgency,  hard  data,  sensor  fusion, 
LIDAR,  SWIR,  MWIR,  3D,  synthetic  data. 

1  Introduction 

Counter  Insurgency  (COIN)  operations  require  the  ability 
to  develop  accurate  representations  of  the  physical 
environment  (e.g.,  terrain,  vegetation,  buildings,  roads, 
vehicles,  etc.)  as  well  as  the  human  landscape 
(individuals,  crowds,  etc.).  Such  operations  potentially 
involve  both  urban  and  non-urban  environments  and  must 
be  conducted  in  various  day/night  and  weather  conditions. 

Mixed hard and soft sensor fusion is an emerging
challenge for the fusion community. Over the past three
decades, we have made substantial progress in hard sensor
fusion, focusing primarily on one-dimensional signals and
the analysis of kinematic data [1-5]. More recent work has
also focused on the fusion of logical or non-signal-based
data with some success [6-8]. At the same time, there has
been an explosion of results on natural language
processing ([9-13] are but a small sample of this work).
Integration of these two disparate fields to create a
common framework for the analysis of COIN operations
may represent a fundamental shift in our prosecution of
the global war on terror and peacekeeping operations in
slowly stabilizing countries. In particular, it could
potentially allow us to integrate and understand the
myriad sensor information collected from in situ and
stand-off sensor systems with the perpetually updated
information collected from the Internet via social network
sites or other covert collection operations.

Several  research  questions  arise  as  a  result  of  the 
hard  and  soft  sensor  fusion  problem.  Automated 
extraction  of  entities  and  geo-intelligence  is  required  to 
task  responders  for  action  as  a  result  of  text  and  image 
information.  Co-registration  of  geographic  features  in 
images  with  those  described  in  text  is  a  new  challenge  for 
the  sensor  fusion  community.  Historical  veracity  of  both 
soft  and  hard  sensor  feeds  must  be  addressed  before 
tasking  results  from  fusion  products.  Co-verification  from 
orthogonal  sources  like  hard  and  soft  sensors  represents  a 
unique  avenue  for  advancement  in  deception  detection. 
Processing text for highly ambiguous terms may require
human interaction in the absence of substantial training
data sets, which may not be available. These problems
may be solvable by including concepts from human
computation [e.g., 14] in which a fusion system reduces
the number of hypotheses available and a human
completes the fusion process. Detection of events (verbs)
inside audio / video streams is complex. Integration of
audio / video processing with text / sensemaking will
be key. Verb detection in images is challenging, especially
in compressed video.

Currently, we are focused on the question of
devising an overarching architecture for the integration of
hard and soft sensor fusion and the specific
sub-architectural features necessary for hard sensor fusion of
multimodal image information. In the following sections,
we  present  the  design  of  our  sensor  suite,  the  information 
processing  architecture,  COIN  operations  verification  and 
validation  scenarios,  and  the  derived  hard  fusion.  We 
conclude  by  outlining  future  work  with  respect  to  both  the 
current  work  and  the  larger  project. 

2  Sensor  Suite 

Four  criteria  drove  our  hard  sensor  suite  design 
(Figure  1).  First,  we  wanted  our  sensor  suite  to  have 
ecological  validity.  By  this,  we  mean  that  we  wanted  to 
select  sensors  representative  of  tactically  deployed  sensors 
[e.g.,  17].  Second,  we  only  selected  sensors  which  provide 
informational  “value  added”  to  the  inference  process  for 
our  selected  targets.  Third,  we  sought  out  sensors  which 
could  be  demonstrated;  namely,  those  which  could  be 
utilized  in  real  demonstrations  and  campus-based 
experiments.  Fourth,  we  wanted  at  least  one  sensor  that 
could  allow  for  innovation  in  the  hard  sensor  fusion 
processing  flow. 




Figure  1:  Sensor  Suite  Design  Criteria 

The result of this selection was the following suite: Light
Detection and Ranging (LIDAR), which operates in the
Short-Wavelength Infrared (SWIR) band, combined with
Mid-Wavelength Infrared (MWIR), visual video, and
acoustic sensors. The fusion applications demonstrated
in this paper combine the Flash LIDAR with a MWIR
sensor. 3D Flash LIDAR uses an array of independent
LIDAR receivers and focuses light returning from the
scene onto the array with a lens system. In many ways,
its user experience is like that of an ordinary digital
video camera. The flash is generated by an on-board laser
module and a beam-spreading optical element. This is
analogous to flash photography with conventional 2D
digital video cameras. Because all pixels are sampled in
parallel, there is no platform or scene motion between
pixel samples within a frame. The 3D Flash LIDAR camera
chosen generates relatively noise-free point cloud videos
at ranges up to 1.5 kilometers and at frame rates up to
30 Hz. The laser wavelength of the camera is 1.57 µm and
is considered eye-safe. The laser is pulsed for only 5 ns
per video frame. The entire camera unit is approximately
11 x 6 x 6 inches in size, weighs approximately 10 lbs,
and can be powered by a standard 110V wall outlet or by a
12V motorcycle battery.

Mid-wavelength infrared (MWIR, IR-C DIN), also called
intermediate infrared (IIR), spans 3-8 µm. The 3 to 5
micron band is defined by the atmospheric window and is
covered by indium antimonide [InSb] and HgCdTe detectors
and partially by lead selenide [PbSe]. In guided missile
technology, the 3-5 µm portion of this band is the
atmospheric window in which the homing heads of passive
IR 'heat seeking' missiles are designed to work, homing
on to the IR signature of the target aircraft, typically
the jet engine exhaust plume. The assembled system is
illustrated in Figure 2 and the specifications are listed
in Table 1.


Figure  2:  Sensor  Suite 


Table 1: Sensor Specifications

Sensor              FOV (deg)  Pixel Pitch (um)  FPA Dimensions  Frame Rate (Hz)
Flash LIDAR (SWIR)  3          100               128 x 128       20
MWIR                2.23       30                256 x 256       60
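As a rough illustration of what the values in Table 1 imply, the sketch below estimates the angular size of one pixel and the width of its footprint at a given range. It assumes that the listed FOV is the full angle across one side of a square focal plane array and that the pixels divide it evenly; neither assumption is stated in the specifications, and the function names are illustrative only.

```python
import math

def pixel_ifov_mrad(fov_deg: float, pixels_across: int) -> float:
    """Approximate per-pixel angular size in milliradians.

    Assumes fov_deg is the full field of view across one side of the
    array and that it is shared evenly by pixels_across pixels.
    """
    return math.radians(fov_deg / pixels_across) * 1000.0

def pixel_footprint_m(fov_deg: float, pixels_across: int, range_m: float) -> float:
    """Approximate width of one pixel's footprint at range_m (small-angle)."""
    return range_m * math.radians(fov_deg / pixels_across)

# FOV (deg) and pixels per side, taken from Table 1.
SENSORS = {
    "Flash LIDAR (SWIR)": (3.0, 128),
    "MWIR": (2.23, 256),
}

if __name__ == "__main__":
    for name, (fov_deg, n_pix) in SENSORS.items():
        ifov = pixel_ifov_mrad(fov_deg, n_pix)
        footprint_cm = 100.0 * pixel_footprint_m(fov_deg, n_pix, 100.0)
        print(f"{name}: {ifov:.3f} mrad/pixel, {footprint_cm:.1f} cm at 100 m")
```

Under these assumptions, a LIDAR pixel subtends roughly 0.41 mrad and a MWIR pixel roughly 0.15 mrad, so one range pixel spans between two and three thermal pixels per side.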

3  Information  Processing  Architecture 

Our  proposed  information  flow  architecture  is  shown  in 
Figure  3.  Both  hard  and  soft  sensor  information  is 
collected  and  can  be  individually  fused  to  create  an 
intermediate  fusion  product.  In  the  case  of  soft  sensing, 
this  may  be  combined  by  key  phrase  detection,  entity 
extraction  or  common  text  area  identification.  In  the  case 
of  hard  sensing  this  may  be  fused  LIDAR  and  infrared 
images  or  time  synchronization  of  video  with  still  camera 
information.

[Figure 3 input labels: Hard Sensing (Images, LIDAR, Acoustic); Soft Sensing (Chat, Reports, Blogs)]


Figure  3:  Proposed  hard  /  soft  sensor  fusion  architecture 
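One of the hard-sensing intermediate products mentioned above is time synchronization between streams. A minimal sketch of nearest-timestamp pairing follows, assuming that each frame carries a timestamp in seconds and that both streams share a clock. The function name, the tolerance parameter, and the use of 20 Hz and 60 Hz example rates (the frame rates listed in Table 1) are illustrative and do not describe the system's actual synchronization method.

```python
from bisect import bisect_left

def pair_nearest(slow_ts, fast_ts, tol_s):
    """Pair each timestamp in slow_ts with the index of the nearest
    timestamp in fast_ts (both sorted ascending, in seconds).

    Returns a list of (slow_index, fast_index) tuples; slow frames with
    no fast frame within tol_s seconds are dropped.
    """
    pairs = []
    for i, t in enumerate(slow_ts):
        j = bisect_left(fast_ts, t)
        # Candidates: the neighbor just before and the one at or after t.
        candidates = [c for c in (j - 1, j) if 0 <= c < len(fast_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda c: abs(fast_ts[c] - t))
        if abs(fast_ts[best] - t) <= tol_s:
            pairs.append((i, best))
    return pairs

if __name__ == "__main__":
    lidar_ts = [i / 20.0 for i in range(5)]   # assumed 20 Hz stream
    mwir_ts = [i / 60.0 for i in range(15)]   # assumed 60 Hz stream
    print(pair_nearest(lidar_ts, mwir_ts, tol_s=1 / 120.0))
```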

These  two  fusion  products  can  then  be  fed  into  a  hard  / 
soft  sensor  fusion  system.  The  integration  of  soft  with 
hard  fused  data  will  require  the  use  of  an  ontological 
processing  entity  to  allow  verbs  or  nouns  in  the  text  to  be 
identified  within  the  images  [15,  16].  Unfortunately,  the 
inclusion  of  ontologies  may  make  the  system  substantially 
more  brittle  and  difficult  to  update.  Therefore,  a  looser 
extensible  dictionary  of  terms  that  can  be  detected  in 
images  may  be  substantially  more  effective  than  a  full 
ontology.  Humans-in-the-loop  may  be  present  at  all  levels 
of the fusion activity. These humans can act as proxies for
algorithms or can serve to disambiguate algorithm
confusion associated with, e.g., ambiguous terms in text
analysis or corrupt images in hard sensor fusion. In the
following sections, we focus on the fusion of mixed image
intelligence data, thereby providing details behind the
components that might function in the left-hand side (hard
sensor fusion component) of the proposed fusion
architecture.
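The idea of a loose, extensible dictionary that links text terms to image-detectable labels, with a human resolving whatever remains ambiguous, can be sketched as follows. This is only an illustration under assumed names: the `TermDictionary` class, the example terms, and the detector labels are hypothetical and do not come from the system described here.

```python
from dataclasses import dataclass, field

@dataclass
class TermDictionary:
    """Extensible mapping from text terms to image-detectable labels."""
    entries: dict = field(default_factory=dict)

    def add(self, term: str, *labels: str) -> None:
        # New terms or extra labels can be added without touching the rest.
        self.entries.setdefault(term.lower(), set()).update(labels)

    def labels_for(self, term: str) -> set:
        return set(self.entries.get(term.lower(), set()))

def associate(terms, detections, dictionary):
    """Match extracted text terms against labels detected in imagery.

    Returns (matched, needs_human): matched maps a term to the single
    detected label it corresponds to; needs_human lists terms with zero
    or several candidate labels, left for a human to resolve.
    """
    matched, needs_human = {}, []
    for term in terms:
        candidates = dictionary.labels_for(term) & set(detections)
        if len(candidates) == 1:
            matched[term] = next(iter(candidates))
        else:
            needs_human.append(term)
    return matched, needs_human

if __name__ == "__main__":
    d = TermDictionary()
    d.add("truck", "vehicle")
    d.add("gunman", "person")
    d.add("cover", "building", "vehicle")  # deliberately ambiguous term
    print(associate(["truck", "cover", "smoke"], ["vehicle", "building"], d))
```

Unlike a full ontology, the dictionary here carries no relations between terms, so extending it is a single `add` call; the trade-off is that more cases fall through to the human.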

4  Sensor  Fusion  Scenarios 

To  support  the  development  and  testing  of  our  sensor 
suite,  information  processing  architecture,  and  data  fusion 
capabilities,  we  have  developed  a  series  of  scenarios  that 
realistically  simulate  both  urban  environments  (in  which 
targets might be obscured by walls, smoke, crowds, etc.)
and  wooded  environments  (where  thick  foliage  can 
interfere  with  sensing  and  observations). 

4.1  Scenario  1  -  Urban 

In  the  first  scenario,  a  simulated  urban  area  is  populated 
with  moving  vehicles  and  simulated  innocent  bystanders 
and  pedestrians.  The  “Blue  team”  of  agents  patrols  the 
area  and  reports  their  findings  using  communication 
devices  (COMM).  The  Blue  team  also  uses  a  Flash 
LIDAR  sensor  bore-sighted  with  a  MWIR  sensor  with  the 
capability  to  pan  and  tilt  as  needed. 


Figure  4:  Urban  Scenario  Environment 


The  “Red  team”  consists  of  a  group  of  individuals  who 
drive  to  a  building  (Building  1  in  Figure  5)  with  a  central 
location  within  a  populated  area.  Two  of  the  individuals 
on  the  Red  team  leave  the  vehicle  and  proceed  to  take 
positions  in  second-story  windows  in  that  building.  The 
Red  team  vehicle  then  proceeds  to  a  nearby  location 
(behind  Building  2  in  Figure  5)  that  people  would  be 
likely  to  flee  to  in  order  to  take  cover  in  the  event  of  a 
shooting.  While  this  occurs,  the  data  feeds  from  the 
LIDAR,  MWIR,  and  COMM  channels  are  being 
monitored  and  recorded  by  the  (Blue  team)  data  fusion 
system.  The  two  Red  team  individuals  in  window 
locations  in  the  building  open  fire  on  the  crowd  below. 
When  this  occurs,  the  crowds  take  cover  near  the 
abandoned  Red  team  vehicle,  which  is  then  detonated. 


Figure  5:  Urban  Scenario 

The scenario above describes the nominal case, as the Red
team intends it to occur. It can play out differently if
the data fusion system detects the threat and recommends
an alternate course of action for the Blue team, or
directs a human analyst's attention to information that
would help them arrive at a preferable Blue team action
plan.

This  scenario  includes  the  potential  for  challenges 
such  as  dense  smoke  to  impede  vision,  crowds  of  innocent 
people  to  make  identification  of  Red  team  members  more 
difficult, and vehicular traffic to add further
complexity. This, combined with realistic simulated radio
COMM,  provides  an  ideal  opportunity  to  exploit  the 
capabilities  of  LIDAR  technology  and  hard/soft  sensor 
fusion  techniques. 

4.2  Scenario  2  -  Dense  Vegetation 

In  the  second  scenario,  we  simulate  a  planned  improvised 
explosive  device  (IED)  ambush  attack  in  a  densely 
wooded  area.  In  this  scenario,  the  Blue  team  is  patrolling 
the  area  with  the  goal  of  protecting  a  convoy.  Meanwhile, 
the  Red  team  has  planted  a  roadside  IED  and  is  planning 
to ambush the Blue team convoy. While this situation
unfolds, fused Flash LIDAR and MWIR data feeds and
COMM dialog between Blue team members are streamed
to the data fusion system for analysis.


Figure  6:  Dense  Vegetation  Environment 

Much  like  in  the  urban  scenario,  the  challenges  faced  by 
the  Blue  team  in  this  scenario  leverage  the  capabilities  of 
LIDAR  as  well  as  the  utility  of  hard/soft  data  fusion 
techniques.  In  this  case,  the  dense  vegetation  of  the 
surrounding  area  combined  with  the  linear  approach  of  the 
convoy  along  the  road  and  the  unconstrained  movement  of 
the  Red  Ambush  Team  and  Blue  Patrol  Team  provides  a 
particularly  interesting  sensing  and  data  fusion  challenge. 


Figure  7:  Dense  Vegetation  Scenario 


5  Hard  Fusion  of  LIDAR  &  IMINT 

Image fusion across modalities is a challenging problem,
from accurate registration to meaningful representation of
the fused information. A fused product must convey the
important  information  from  each  modality  in  a  way  that 
can  be  naturally  interpreted  by  a  human  observer.  We 
propose  a  method  of  fusing  3D  range  information  from  a 
Flash  LIDAR  with  a  thermal  MWIR  image  to  convey  the 
location  of  objects  of  interest  within  the  focal  plane  and 
naturally  within  3-space.  The  fusion  method  makes  use  of 
human  visual  perception  of  color  and  brightness  to  convey 
range  and  temperature,  respectively. 


Figure  8:  A  Flash  LIDAR  range  image  mapped  to  8  color 
bins  within  the  ranges  of  280  ft.  to  370  ft. 

Given a range image $R$ directly mapped to ortho-rectified
$(x, y, z)$ points in 3-space, and a thermal image $T$, assumed
to be registered pixel-by-pixel to the range image, a fused
image may be constructed using data from each source.
Given a colormap $M_R$ with $n$ bins, $M_R : R \to R_n$ divides
the range image into $n$ colors, where $R_n \in M_{p \times 3}$ for $p$
pixels (Figure 8). Each row of $R_n$ provides a red, green,
and blue value to define the pixel color. Given an
intensity map $M_I$ with $k$ bins, $M_I : T \to I_k$ discretizes the
intensity image into $k$ bins, where $I_k \in M_{p \times 1}$ and each
$I_k(i) \in [0, 1]$, $i = 1, \ldots, p$ (Figure 9). The fused image $F$ is
defined as

$$F = [\, I_k \;\; I_k \;\; I_k \,] \circ R_n,$$

where $\circ$ denotes the entry-wise product. This scales the
intensity of the pixel colors by the intensity of the
thermal image (Figure 10).
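The construction above can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it assumes images are given as flat lists of p pixel values, uses an HSV hue ramp as the n-bin range colormap (the text does not specify which colormap is used), and clamps out-of-range values to the end bins. All function names and parameters are hypothetical.

```python
import colorsys

def quantize(value, lo, hi, bins):
    """Return the bin index in [0, bins - 1] for value, clamped to [lo, hi]."""
    frac = (min(max(value, lo), hi) - lo) / (hi - lo)
    return min(int(frac * bins), bins - 1)

def range_colors(range_img, r_min, r_max, n):
    """M_R: map each range value to one of n RGB colors (rows of R_n)."""
    # Assumed palette: n hues running from red (near) toward blue (far).
    palette = [colorsys.hsv_to_rgb((2.0 / 3.0) * b / max(n - 1, 1), 1.0, 1.0)
               for b in range(n)]
    return [palette[quantize(r, r_min, r_max, n)] for r in range_img]

def thermal_intensity(thermal_img, t_min, t_max, k):
    """M_I: discretize thermal values into k levels in [0, 1] (entries of I_k)."""
    return [quantize(t, t_min, t_max, k) / max(k - 1, 1) for t in thermal_img]

def fuse(range_img, thermal_img, r_min, r_max, t_min, t_max, n=8, k=256):
    """F = [I_k I_k I_k] o R_n: scale each pixel's RGB color by its intensity."""
    if len(range_img) != len(thermal_img):
        raise ValueError("images must be registered pixel-by-pixel")
    colors = range_colors(range_img, r_min, r_max, n)
    intensity = thermal_intensity(thermal_img, t_min, t_max, k)
    return [tuple(i * c for c in rgb) for rgb, i in zip(colors, intensity)]

if __name__ == "__main__":
    ranges_m = [80.0, 100.0, 100.0, 114.0]   # toy range values in meters
    thermal = [0.1, 0.1, 0.9, 0.5]           # toy normalized thermal values
    for px in fuse(ranges_m, thermal, 75.0, 115.0, 0.0, 1.0):
        print(tuple(round(c, 3) for c in px))
```

Because color encodes the range bin and brightness encodes temperature, a warm object keeps the hue of its range bin but appears brighter than cooler surroundings at the same range, which is the effect described for Figure 10.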


Figure 9: A MWIR thermal image mapped to 256 bins
with values in [0, 1]


6  Conclusions  and  Future  Work 

In  this  paper,  we  presented  a  hard/soft  information  fusion 
architecture, a hard sensor suite (accompanied by the
rationale  driving  the  design),  COIN-inspired  scenarios  for 
driving  our  empirical  analyses,  and  the  hard  sensor  fusion 
algorithms  and  resulting  human-centered  information 
products. 

As  we  continue  with  this  work,  we  will:  (1)  conduct 
experiments  centered  on  the  scenarios  described,  (2) 
explore  the  algorithmic  design  space  that  arises  from  the 
resulting  data,  and  (3)  investigate  approaches  to  the  fusion 
of  our  hard  and  soft  data  sets. 

The  two  primary  locations  for  our  planned  experiments 
are  the  Penn  State  campus  at  University  Park,  PA  and  a 
fire  safety  facility  located  nearby.  The  Penn  State  campus 
is  ideal  for  using  non-threatening  analogous  scenarios  to 
investigate  human-in-the-loop  issues  such  as  knowledge 
elicitation  from  participating  observers,  dynamics  of 
centralized  vs.  distributed  command,  motivation  of  human 
observers,  and  team  cognition.  Additionally,  the  Extreme 
Events  Lab  [18]  located  on  the  Penn  State  campus  can 
serve  as  a  command  center  for  these  experiments.  The 
fire  safety  facility  serves  as  an  ideal  location  for  the  hard 
sensor  experiments  described  above. 

Data fusion is in many ways a design science. Rather
than just developing algorithms, we must seek out a
deeper understanding of the nature of our algorithms.
Indeed, the selection of algorithmic approaches cannot be
centered solely on the efficacy of a given algorithmic
approach, but must also consider the competing alternative
trade-offs. How does a given algorithmic approach support,
fail to support, or even undermine our goals by filling,
failing to fill, or undermining the information needs -
information delivery - sensemaking - action cycle?

Figure 10: Two views of the fused data showing only
points between 75 m and 115 m. The human has a higher
temperature than the background, resulting in more vibrant
colors that clearly indicate his range at 100 m.

The  SYNCOIN  dataset  [19]  has  been  constructed  by 
Penn  State  researchers  in  order  to  provide  a  substantial  set 
of  synthetic  soft  data  with  corresponding  hard  sensor  data 
opportunities.  SYNCOIN  contains  approximately  600 
messages  that  represent  COIN-inspired  scenarios. 
Additionally,  “ground  truth”  and  pedigree  metadata 
documents  are  maintained  for  all  messages  and  their 
corresponding  threads  of  interest.  The  activities  and 
experiments  described  in  this  paper  will  complement 
SYNCOIN  by  providing  relevant  hard  sensor  datasets. 
This combination of a new sensor suite, realistic synthetic
hard and soft data, relevant metadata/ground truth
documents, and the capability to perform human-in-the-loop
experimentation represents an opportunity to break
new ground in human-centric hard and soft information
fusion.